{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Decision Tree\n",
    "\n",
    "### Classification vs Regression\n",
    "\n",
    "Classification is a problem of assigning a label to an **unlabeled example**. In machine learning, the classification problem is solved by a classification learning algorithm that takes **a collection of labeled examples** as inputs and produces a model that can take an unlabeled example as input and either directly **output a label or output a number** that can be used by the data analyst to deduce the label easily.\n",
    "\n",
    "- Binary classification\n",
    "- Multiclass classification\n",
    "- Probabilities -- how\n",
    "\n",
    "### Supervised learning vs Unsupervised learning\n",
    "\n",
    "In supervised learning, the dataset is the collection of labeled examples $\\{(X_i , y_i)\\}^N_{i=1}$. Each element $x_i$ among $N$ is called a **feature vector**. A feature vector is a vector in which each dimension $j = 1, \\dots, D$ contains a value that describes the example. That value is called a feature and is denoted as $x^j$ . The label $y_i$ can be either an element belonging to a finite set of classes ${1, 2, \\dots, C}$, or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph.\n",
    "\n",
    "The goal of a supervised learning algorithm is to use the dataset to produce a model that takes a feature vector $X_i$ as input and outputs information that allows deducing the label for this feature vector.\n",
    "\n",
    "In **unsupervised learning**, the dataset is a collection of unlabeled examples $\\{(X_i)\\}^N_{i=1}$. Again, $X_i$ is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector $X_i$ as input and either transforms it into another vector or into a value that can be used to solve a practical problem.\n",
    "- Clustering\n",
    "- Dimensionality reduction\n",
    "- Fault/outlier detection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tree based learning algorithms\n",
    "\n",
    "Tree based learning algorithms are considered to be one of the best and mostly used supervised learning methods. Tree based\n",
    "methods empower predictive models with high accuracy, stability and ease of interpretation.\n",
    "\n",
    "Unlike linear models, they map non-linear relationships quite well. They are adaptable at solving any kind of problem at hand\n",
    "(classification or regression).\n",
    "\n",
    "### Decision Tree\n",
    "Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in **classification problems**.\n",
    "\n",
    "It works for both **categorical and continuous** input and output variables.\n",
    "\n",
    "We aim to split **the population or sample into two or more homogeneous sets (or sub-populations) based on most significant splitter / differentiator in input variables**.\n",
    "\n",
    "Decision tree identifies the most significant variable and it’s value that gives best homogeneous sets of population. Now the question which arises is, how does it identify the variable and the split? \n",
    "\n",
    "To do this, decision tree uses various algorithms.\n",
    "\n",
    "1. Categorical Variable Decision Tree: Decision Tree which has categorical target variable then it called as categorical variable decision tree. Example: In above scenario of student problem, where the target variable was “Student will play cricket or not” i.e. YES or NO.\n",
    "2. Continuous Variable Decision Tree: Decision Tree has continuous target variable then it is called as Continuous Variable Decision Tree\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Example:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Questions\n",
    "\n",
    "We need to understand and answer a few more questions related to decision tree.\n",
    "- What are the impurity measures in decision tree?\n",
    "- What are the decision tree algorithms?\n",
    "- When does a decision tree growth stopped?\n",
    "- How to interpret results of a decision tree?\n",
    "\n",
    "### Impurity measure\n",
    "The decision of making strategic splits heavily affects a tree’s accuracy. Decision tree splits the nodes on all available variables and then selects the split which results in most homogeneous subnodes.\n",
    "\n",
    "#### Gini index\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
